4.3 P-Values, Confidence Regions

1 P-Values

Suppose $ϕ (X)$ rejects for large values of $T (X)$ . Informally, the p-value is the probability, under the null hypothesis, that $T (X)$ is at least as large as the value we observed, i.e. $p (x) = P_{H_{0}} (T (X) \geq T (x)) = \sup_{θ \in Θ_{0}} P_{θ} (T (X) \geq T (x)) .$

Example

Binomial: $X \sim Binomial (n, θ)$ . $H_{0} : θ \leq 0.5 \text{ vs. } H_{1} : θ > 0.5$ . The one-sided test rejects for large $X$ : $p (x) = P_{H_{0}} (X \geq x) = \sup_{θ \leq 0.5} P_{θ} (X \geq x) = P_{0.5} (X \geq x) ,$ since the upper tail probability is nondecreasing in $θ$ .
Normal: $X \sim N (θ, 1)$ . $H_{0} : θ = 0 \text{ vs. } H_{1} : θ \neq 0$ . The two-sided test rejects for large $T (X) = | X |$ (i.e. $ϕ_{α} (x) = 1 \{| x | > z_{α / 2}\}$ ). The two-sided p-value is $p (X)$ where $p (x) = P_{0} (| X | > | x |) = 2 (1 - Φ (| x |)) .$
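The two p-values above can be computed directly. The following Python sketch uses only the standard library; the function names are illustrative assumptions, not part of the notes.

```python
import math


def binomial_upper_tail(x: int, n: int, theta: float) -> float:
    """P_theta(X >= x) for X ~ Binomial(n, theta)."""
    return sum(
        math.comb(n, k) * theta**k * (1 - theta) ** (n - k)
        for k in range(x, n + 1)
    )


def binomial_one_sided_p(x: int, n: int, theta0: float = 0.5) -> float:
    """p-value for H0: theta <= theta0 when the test rejects for large X.

    The upper tail is nondecreasing in theta, so the supremum over
    theta <= theta0 is attained at theta0.
    """
    return binomial_upper_tail(x, n, theta0)


def normal_two_sided_p(x: float) -> float:
    """p-value 2 * (1 - Phi(|x|)) for H0: theta = 0 with X ~ N(theta, 1).

    Uses the identity 2 * (1 - Phi(t)) = erfc(t / sqrt(2)).
    """
    return math.erfc(abs(x) / math.sqrt(2))


print(binomial_one_sided_p(8, 10))
print(normal_two_sided_p(1.96))
```

For example, observing $x = 8$ with $n = 10$ gives $p (x) = 56 / 1024$ , the sum of the three largest binomial probabilities at $θ = 0.5$ .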

Now we give a formal definition:

P-Value

Given the model $P$ and parameter sets $Θ_{0} \subset Θ$ . Assume we have a test $ϕ_{α}$ for each significance level $α$ , with $\sup_{θ \in Θ_{0}} E_{θ} ϕ_{α} (X) \leq α$ . (In the non-randomized case, $ϕ_{α} (x) = 1 \{x \in R_{α}\}$ .)
Assume the tests are monotone in $α$ : if $α_{1} \leq α_{2}$ , then $ϕ_{α_{1}} (x) \leq ϕ_{α_{2}} (x)$ . (In the non-randomized case, $R_{α_{1}} \subset R_{α_{2}}$ .)
Then the p-value is $p (x) = \sup \{α : ϕ_{α} (x) < 1\}$ $(= \sup \{α : x \notin R_{α}\}) .$

$p (x)$ here measures how "extreme" the observed $T (x)$ is: the smaller the p-value, the more extreme the observation.
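As a rough numerical illustration of the formal definition, the sketch below recovers the two-sided normal p-value as $\sup \{α : ϕ_{α} (x) < 1\}$ by bisection, using the test $ϕ_{α} (x) = 1 \{| x | > z_{α / 2}\}$ from the earlier example. The names `rejects` and `p_value_by_sup` are illustrative assumptions; bisection is valid here only because the tests are monotone in $α$ .

```python
from statistics import NormalDist


def rejects(x: float, alpha: float) -> bool:
    """Two-sided level-alpha test of H0: theta = 0 for X ~ N(theta, 1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return abs(x) > z


def p_value_by_sup(x: float, tol: float = 1e-10) -> float:
    """Approximate sup{alpha : the level-alpha test does not reject x}.

    Monotonicity in alpha means the rejecting levels form an upper
    interval, so bisection finds its left endpoint.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rejects(x, mid):
            hi = mid  # rejected at mid: the supremum is at most mid
        else:
            lo = mid  # not rejected at mid: the supremum is at least mid
    return lo


closed_form = 2 * (1 - NormalDist().cdf(1.96))
print(p_value_by_sup(1.96), closed_form)
```

The bisection result should agree with the closed form $2 (1 - Φ (| x |))$ up to the tolerance.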

For $θ \in Θ_{0}$ and any $ε > 0$ , the event $p (X) \leq α$ implies $ϕ_{α + ε} (X) = 1$ , so $\begin{aligned} P_{θ} (p (X) \leq α) & = P_{θ} (\sup \{\tilde{α} : ϕ_{\tilde{α}} (X) < 1\} \leq α) \\ & \leq \lim_{ε \to 0^{+}} P_{θ} (ϕ_{α + ε} (X) = 1) \\ & \leq \lim_{ε \to 0^{+}} (α + ε) = α . \end{aligned}$
So, under the null, the p-value stochastically dominates $U [0, 1]$ .
If $ϕ_{α}$ rejects for large $T (X)$ , this reduces to the original definition.
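The bound $P_{θ} (p (X) \leq α) \leq α$ can be checked exactly in the one-sided binomial example, where the p-value is discrete and the inequality is typically strict. This is a minimal sketch with hypothetical function names, assuming the p-value $p (x) = P_{0.5} (X \geq x)$ .

```python
import math


def upper_tail(x: int, n: int, theta: float) -> float:
    """P_theta(X >= x) for X ~ Binomial(n, theta)."""
    return sum(
        math.comb(n, k) * theta**k * (1 - theta) ** (n - k)
        for k in range(x, n + 1)
    )


def prob_p_value_at_most(alpha: float, n: int, theta: float,
                         theta0: float = 0.5) -> float:
    """Exact P_theta(p(X) <= alpha), where p(x) = P_theta0(X >= x)."""
    total = 0.0
    for x in range(n + 1):
        if upper_tail(x, n, theta0) <= alpha:
            # x lies in the event {p(X) <= alpha}; add its probability.
            total += math.comb(n, x) * theta**x * (1 - theta) ** (n - x)
    return total


for theta in (0.2, 0.4, 0.5):
    for alpha in (0.01, 0.05, 0.10):
        print(theta, alpha, prob_p_value_at_most(alpha, 20, theta))
```

Every printed probability should be at most the corresponding $α$ , for each null value $θ \leq 0.5$ .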

Note that the p-value depends on:

The model $P_{θ}$ and the hypothesis $H_{0}$ ;
The data $X$ ;
The choice of test.

Example

Exponential: $X \sim Exp (θ)$ . $H_{0} : θ = 1 \text{ vs. } H_{1} : θ \neq 1$ . We can use the equal-tailed test or the UMPU test. For $X > 1$ ,
- Equal-tailed test: $p (x) = 2 P_{1} (X \geq x) = 2 e^{- x} .$
- UMPU: $p (x) = α$ for which $c_{2} (α) = x$ , where $c_{2} (α)$ denotes the upper cutoff of the level- $α$ UMPU test.
Normal: $X \sim N_{d} (θ, I_{d})$ . $H_{0} : θ = 0 \text{ vs. } H_{1} : θ \neq 0$ . We can use the $χ^{2}$ test with $T_{1} (X) = \| X \|^{2}$ or the maximum test with $T_{2} (X) = \| X \|_{\infty} = \max_{i} | X_{i} |$ . The two tests generally give different p-values for the same data.
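To see how the choice of test changes the p-value, the sketch below evaluates both tests for the multivariate normal example, together with the equal-tailed exponential p-value. It makes several assumptions for illustration: the function names are hypothetical, the $χ^{2}$ survival function uses the closed form that holds for even $d$ only, and the equal-tailed p-value is written as twice the smaller tail so that it also covers small $x$ .

```python
import math


def std_normal_cdf(z: float) -> float:
    """Phi(z) via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def chi2_sf_even(t: float, d: int) -> float:
    """P(chi^2_d >= t) for even d: exp(-t/2) * sum_{k < d/2} (t/2)^k / k!."""
    if d % 2 != 0:
        raise ValueError("closed form used here requires even d")
    half = t / 2
    return math.exp(-half) * sum(
        half**k / math.factorial(k) for k in range(d // 2)
    )


def p_chi2(x) -> float:
    """p-value of the chi-square test, T_1(x) = ||x||^2."""
    return chi2_sf_even(sum(v * v for v in x), len(x))


def p_max(x) -> float:
    """p-value of the maximum test, T_2(x) = max_i |x_i|.

    Under H0 the coordinates are i.i.d. N(0, 1), so
    P(max_i |X_i| >= t) = 1 - (2 * Phi(t) - 1)^d.
    """
    t = max(abs(v) for v in x)
    return 1 - (2 * std_normal_cdf(t) - 1) ** len(x)


def exp_equal_tailed_p(x: float) -> float:
    """Equal-tailed p-value for H0: theta = 1 with X ~ Exp(1) under H0."""
    upper = math.exp(-x)  # P_1(X >= x)
    return 2 * min(upper, 1 - upper)


print(p_chi2((3.0, 0.0)), p_max((3.0, 0.0)))  # one large coordinate
print(p_chi2((2.0, 2.0)), p_max((2.0, 2.0)))  # signal spread over both
print(exp_equal_tailed_p(2.0))
```

In this sketch the maximum test gives the smaller p-value when a single coordinate is large, and the $χ^{2}$ test gives the smaller p-value when the signal is spread across coordinates.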

2 Confidence Sets

2.1 Definition

Confidence Set

$C (X)$ is a $1 - α$ confidence set for $g (θ)$ if $P_{θ} (C (X) ∋ g (θ)) \geq 1 - α, \forall θ \in Θ$ .
We say $C (X)$ covers $g (θ)$ if $C (X) ∋ g (θ)$ .
$P_{θ} (C (X) ∋ g (θ))$ is the coverage probability.
$\inf_{θ} P_{θ} (C (X) ∋ g (θ))$ is the confidence level.

$C (X)$ is random, not $g (θ)$ ; $g (θ)$ is a deterministic function of $θ$ .

The coverage guarantee is often misinterpreted as a Bayesian guarantee.

Say " $C (X)$ has a $95 \%$ chance of covering $g (θ)$ ", not " $g (θ)$ has a $95 \%$ chance of lying in $C (X)$ ".

2.2 Duality of Testing & Confidence Sets

Suppose that for each $a \in g (Θ)$ we have a level- $α$ test $ϕ (x; a)$ of $H_{0} : g (θ) = a \text{ vs. } H_{1} : g (θ) \neq a . \quad (2.1)$
We can use it to make a confidence set for $g (θ)$ :
Let $C (X) = \{a : ϕ (X; a) < 1\}$ (all non-rejected values of $a$ ). Then $P_{θ} (C (X) ∌ g (θ)) = P_{θ} (ϕ (X; g (θ)) = 1) \leq α, \forall θ .$ This is called inverting the test.
Conversely, suppose $C (X)$ is a $1 - α$ confidence set for $g (θ)$ . We can use $C$ to construct a test $ϕ (X)$ of (2.1): let $ϕ (X) = 1 \{a \notin C (X)\}$ . For $θ$ with $g (θ) = a$ , $E_{θ} ϕ (X) = P_{θ} (C (X) ∌ g (θ)) \leq α .$
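The construction of a confidence set from a family of tests can be sketched numerically. The example below assumes the model $X \sim N (θ, 1)$ with $g (θ) = θ$ and the two-sided test that rejects when $| x - a | > z_{α / 2}$ ; the grid and the function names are illustrative choices. Keeping every grid value $a$ that is not rejected should reproduce the familiar interval $x \pm z_{α / 2}$ up to the grid resolution.

```python
from statistics import NormalDist


def two_sided_test(x: float, a: float, alpha: float) -> int:
    """phi(x; a): returns 1 if H0: theta = a is rejected, else 0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 if abs(x - a) > z else 0


def confidence_set_on_grid(x: float, alpha: float, grid):
    """C(x) restricted to the grid: all values a with phi(x; a) < 1."""
    return [a for a in grid if two_sided_test(x, a, alpha) < 1]


grid = [i / 100 for i in range(-500, 501)]  # candidate values of a
kept = confidence_set_on_grid(1.0, 0.05, grid)
z = NormalDist().inv_cdf(0.975)
print(min(kept), max(kept))  # grid approximation of the endpoints
print(1.0 - z, 1.0 + z)      # closed-form endpoints x -/+ z_{alpha/2}
```

A grid search is rarely needed in practice, since most tests can be inverted in closed form, but it makes the definition of $C (X)$ concrete.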

2.3 Confidence Interval for Median

For the nonparametric model $X_{1}, \dots, X_{n} \overset{i.i.d.}{\sim} F$ (where $F$ is any c.d.f.), define $g (F) = median (F) = F^{- 1} (\frac{1}{2})$ . Consider the two-sided test $\begin{aligned} & H_{0} : g (F) = μ ⟺ F (μ) = \frac{1}{2} \\ \text{vs. } & H_{1} : g (F) \neq μ ⟺ F (μ) \neq \frac{1}{2} . \end{aligned}$
Denote $S (X; μ) = \# \{X_{i} > μ\} \sim Binomial (n, 1 - F (μ)) \overset{H_{0}}{=} Binomial (n, \frac{1}{2})$ . Reject for $T (X; μ) = | S (X; μ) - \frac{n}{2} | > c_{α}$ . Then $\begin{aligned} μ \in C (X) ⟺ & | S (X; μ) - \frac{n}{2} | \leq c_{α} \\ ⟺ & \# \{X_{i} > μ\} \in [\frac{n}{2} - c_{α}, \frac{n}{2} + c_{α}] \\ ⟺ & μ \in [X_{(\frac{n}{2} - c_{α})}, X_{(\frac{n}{2} + c_{α} + 1)}) , \end{aligned}$ where $X_{(k)}$ is the $k$ -th order statistic and $\frac{n}{2} \pm c_{α}$ are taken to be integers. The upper endpoint is excluded because $\# \{X_{i} > μ\} \geq m$ is equivalent to $μ < X_{(n - m + 1)}$ .
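The order-statistic interval can be computed as follows. This is a sketch under stated assumptions: $n$ is even so that $\frac{n}{2} \pm c_{α}$ are integers, $c_{α}$ is taken as the smallest integer cutoff giving level $α$ under $Binomial (n, \frac{1}{2})$ , and the function names are hypothetical. Counting $\# \{X_{i} > μ\}$ exactly, the lower endpoint is the order statistic of rank $\frac{n}{2} - c_{α}$ (included) and the upper endpoint is the order statistic of rank $\frac{n}{2} + c_{α} + 1$ (excluded).

```python
import math


def two_sided_cutoff(n: int, alpha: float) -> int:
    """Smallest integer c with P(|S - n/2| > c) <= alpha, S ~ Binomial(n, 1/2)."""
    pmf = [math.comb(n, k) / 2**n for k in range(n + 1)]
    for c in range(n // 2 + 1):
        tail = sum(p for k, p in enumerate(pmf) if abs(k - n / 2) > c)
        if tail <= alpha:
            return c
    raise ValueError("alpha must be nonnegative")


def median_confidence_interval(xs, alpha: float):
    """Return (lower, upper) so that the confidence set is [lower, upper)."""
    n = len(xs)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even sample size")
    c = two_sided_cutoff(n, alpha)
    s = sorted(xs)
    lo_rank = n // 2 - c      # 1-based rank of the lower endpoint
    hi_rank = n // 2 + c + 1  # 1-based rank of the (excluded) upper endpoint
    lower = s[lo_rank - 1] if lo_rank >= 1 else -math.inf
    upper = s[hi_rank - 1] if hi_rank <= n else math.inf
    return lower, upper


print(median_confidence_interval(list(range(1, 11)), 0.05))
```

With $n = 10$ and $α = 0.05$ the cutoff is $c_{α} = 3$ , so the set runs from the 2nd to the 9th order statistic and has exact coverage $1002 / 1024$ for continuous $F$ .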

3 Confidence Intervals/Bounds

If $C (X) = [C_{1} (X), C_{2} (X)]$ , we say $C (X)$ is a confidence interval (CI).

$C (X) = [C_{1} (X), \infty)$ is a lower confidence bound (LCB).
$C (X) = (- \infty, C_{2} (X)]$ is an upper confidence bound (UCB).

We usually get an LCB or UCB by inverting a one-sided test in the appropriate direction; the bound is called uniformly most accurate (UMA) if the test is UMP. We get a CI by inverting a two-sided test; the interval is called UMAU if the test is UMPU.

Example

$X \sim Exp (θ)$ with density $p_{θ} (x) = \frac{1}{θ} e^{- \frac{x}{θ}}, x > 0, θ > 0$ . The c.d.f. is $P_{θ} (X \leq x) = 1 - e^{- \frac{x}{θ}}$ .

LCB: Invert the test of $H_{0} : θ \leq θ_{0}$ , which rejects for $X > c (θ_{0})$ . Solving $α = P_{θ_{0}} (X > c (θ_{0})) = e^{- \frac{c (θ_{0})}{θ_{0}}}$ gives $c (θ_{0}) = - θ_{0} \log α > 0$ . The test accepts when $X \leq c (θ_{0}) ⟺ θ_{0} \geq - \frac{X}{\log α}$ , so $C (X) = [- \frac{X}{\log α}, + \infty)$ .
UCB: Similarly, inverting the test of $H_{0} : θ \geq θ_{0}$ , which rejects for $X < - θ_{0} \log (1 - α)$ , gives $C (X) = (- \infty, - \frac{X}{\log (1 - α)}]$ .
Equal-tailed CI: invert the equal-tailed test of $H_{0} : θ = θ_{0}$ : $ϕ_{α}^{ET} (X) = ϕ_{\frac{α}{2}}^{\geq θ_{0}} (X) + ϕ_{\frac{α}{2}}^{\leq θ_{0}} (X),$ and $\begin{aligned} C (X) & = [- \frac{X}{\log \frac{α}{2}}, + \infty) \cap (- \infty, - \frac{X}{\log (1 - \frac{α}{2})}] \\ & = [- \frac{X}{\log \frac{α}{2}}, - \frac{X}{\log (1 - \frac{α}{2})}] . \end{aligned}$
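The three constructions for the exponential example translate directly into code. In this sketch the function names are illustrative assumptions, and `equal_tailed_coverage` checks the coverage of the equal-tailed interval exactly by rewriting the event $θ \in C (X)$ as an event about $X$ and using the c.d.f. $1 - e^{- x / θ}$ .

```python
import math


def exp_lcb(x: float, alpha: float) -> float:
    """Lower confidence bound: C(X) = [-X / log(alpha), +inf)."""
    return -x / math.log(alpha)


def exp_ucb(x: float, alpha: float) -> float:
    """Upper confidence bound: C(X) = (-inf, -X / log(1 - alpha)]."""
    return -x / math.log(1 - alpha)


def exp_equal_tailed_ci(x: float, alpha: float):
    """Intersection of the level-(alpha/2) lower and upper bounds."""
    return exp_lcb(x, alpha / 2), exp_ucb(x, alpha / 2)


def equal_tailed_coverage(theta: float, alpha: float) -> float:
    """Exact P_theta(theta in C(X)) for the equal-tailed interval.

    theta is covered iff
    -theta * log(1 - alpha/2) <= X <= -theta * log(alpha/2).
    """
    def cdf(x: float) -> float:
        return 1 - math.exp(-x / theta)

    x_low = -theta * math.log(1 - alpha / 2)
    x_high = -theta * math.log(alpha / 2)
    return cdf(x_high) - cdf(x_low)


print(exp_equal_tailed_ci(2.0, 0.05))
print(equal_tailed_coverage(3.0, 0.05))
```

The coverage equals $1 - α$ for every $θ$ , as expected from inverting a test whose level is exactly $α$ . Because a single exponential observation carries little information, the interval is very wide.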